Combining dynamic head pose-gaze mapping with the robot conversational state for attention recognition in human-robot interactions

Authors

  • Samira Sheikhi
  • Jean-Marc Odobez
Abstract

The ability to recognize people's Visual Focus of Attention (VFOA, i.e. what or whom a person is looking at) is important for robots or conversational agents interacting with multiple people, since it plays a key role in turn-taking, engagement or intention monitoring. As eye gaze estimation is often impossible to achieve, most systems currently rely on head pose as an approximation, which creates ambiguities since the same head pose can be used to look at different VFOA targets. To address this challenge, we propose a dynamic Bayesian model for VFOA recognition from head pose, with two main contributions. First, taking inspiration from behavioral models describing the relationships between the body, head and gaze orientations involved in gaze shifts, we propose novel gaze models that dynamically and more accurately predict the expected head orientation used for looking in a given gaze target direction. This aspect has been neglected in previous work but is essential for recognition. Second, we propose to exploit the robot's conversational state (when it speaks, and the objects to which it refers) as context to set appropriate priors on candidate VFOA targets and reduce the inherent VFOA ambiguities. Experiments on a public dataset in which the humanoid robot NAO plays the role of an art guide and quiz master demonstrate the benefit of the two contributions.
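The two ideas in the abstract can be illustrated with a minimal single-frame sketch. It is not the authors' model: the abstract describes a dynamic Bayesian model, whereas this sketch ignores temporal dynamics entirely. The linear head-gaze relation with a fixed fraction `kappa`, the Gaussian likelihood, the pan-only geometry, and every name and number below are assumptions chosen for illustration.

```python
import math

def expected_head_pan(target_pan, reference_pan=0.0, kappa=0.6):
    """Assumed head-gaze mapping: the head covers only a fraction `kappa`
    of the rotation from a reference direction to the gaze target."""
    return reference_pan + kappa * (target_pan - reference_pan)

def gaussian(x, mean, sigma):
    """Gaussian density, used here as an assumed head-pose likelihood."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def vfoa_posterior(head_pan, targets, prior, reference_pan=0.0,
                   kappa=0.6, sigma=10.0):
    """Posterior over VFOA targets for one observed head pan angle (degrees).

    targets: dict mapping target name -> gaze direction of that target
    prior:   dict mapping target name -> prior probability (context)
    """
    scores = {}
    for name, target_pan in targets.items():
        mean = expected_head_pan(target_pan, reference_pan, kappa)
        scores[name] = prior[name] * gaussian(head_pan, mean, sigma)
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Hypothetical scene: two targets whose expected head poses lie on
# either side of the observed head pan, so head pose alone is ambiguous.
targets = {"robot": 0.0, "painting": 20.0}
observed_head_pan = 6.0

uniform_prior = {"robot": 0.5, "painting": 0.5}
# Hypothetical context: the robot is speaking and referring to the painting.
context_prior = {"robot": 0.3, "painting": 0.7}

print(vfoa_posterior(observed_head_pan, targets, uniform_prior))
print(vfoa_posterior(observed_head_pan, targets, context_prior))
```

With the uniform prior the two targets are equally likely, because the observed head pan is equidistant from both expected head poses. The context prior breaks the tie in favor of the object being referred to, which is the role the abstract assigns to the robot's conversational state.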

Similar resources

Recognizing the Visual Focus of Attention for Human Robot Interaction

We address the recognition of people’s visual focus of attention (VFOA), the discrete version of gaze that indicates who is looking at whom or what. As a good indicator of addressee-hood (who speaks to whom, and in particular is a person speaking to the robot) and of people’s interest, VFOA is an important cue for supporting dialog modelling in Human-Robot interactions involving multiple person...

Using Model-Based Dynamic Hand Pose Estimation to Imitate Human Arm Motion by a Robot with Kinect Data

Pose estimation is a process to identify how a human body and/or individual limbs are configured in a given scene. Hand pose estimation is an important research topic which has a variety of applications in human-computer interaction (HCI) scenarios, such as gesture recognition, animation synthesis and robot control. However, capturing the hand motion is quite a challenging task due to its high ...

The Vernissage Corpus: a Multimodal Human-robot-interaction Dataset

We introduce a new multimodal interaction dataset with extensive annotations in a conversational Human-Robot Interaction (HRI) scenario. It has been recorded and annotated to benchmark many relevant perceptual tasks, towards enabling a robot to converse with multiple humans, such as speaker localization, key word spotting, speech recognition in audio domain; tracking, pose estimation, nodding, v...

Robot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm

This paper aims to develop a writing robot that recognizes the speech signal from the user. The robot arm is constructed mainly for disabled people who cannot write on their own. Here, a dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...

Probabilistic Mapping of Human Visual Attention from Head Pose Estimation

Effective interaction between a human and a robot requires the bidirectional perception and interpretation of actions and behavior. While actions can be identified as a directly observable activity, this might not be sufficient to deduce actions in a scene. For example, orienting our face toward a book might suggest the action toward “reading.” For a human observer, this deduction requires the ...

Journal:
  • Pattern Recognition Letters

Volume 66

Publication date: 2015